07. Quiz: Q-Learning

Say that an agent is learning to navigate the gridworld described earlier in the lesson.

[Figure: Gridworld Example]

Suppose the agent is using Q-Learning to search for the optimal policy, with step size α = 0.1.

At the end of the 99th episode, the Q-table has the following values:

[Figure: Q-table]

Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives a reward of -1, and the next state is state 2.

[Figure: Beginning of the 100th episode]

In the previous video, you learned that at this point, the agent updates the Q-table.
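For reference, the Q-Learning (also called Sarsamax) update is usually written in the standard textbook form below, where γ is the discount rate; the lesson's own notation may differ slightly. In this quiz, S_t = 1, A_t = right, R_{t+1} = -1, and S_{t+1} = 2.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right)
```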

Which entry in the Q-table is updated?

SOLUTION: The entry corresponding to **state 1** and **action right**.

What is the new value in the Q-table for the state-action pair you identified in the previous question?

(Suppose that when selecting the actions for the first two timesteps in the 100th episode, the agent was following the epsilon-greedy policy with respect to the Q-table, with epsilon = 0.4.)
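The parenthetical above refers to ε-greedy action selection. As a minimal sketch, assuming a state's action values are stored as a plain Python list (a hypothetical representation, with made-up values), the ε-greedy policy gives every action probability ε/n and adds the remaining 1 - ε to the greedy action. Note that ε only affects which action the agent takes; the Q-Learning update itself bootstraps from the maximum action value in the next state, whatever action is actually selected there.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the epsilon-greedy action probabilities for one state.

    q_values: list of action values for the state (one entry per action).
    Every action gets epsilon / n; the greedy action gets an extra 1 - epsilon.
    Ties are broken in favor of the first maximizing action.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Sample an action index from the epsilon-greedy policy."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical action values for a single state (NOT the quiz's Q-table).
q_state = [1.0, 3.0, 2.0, 0.5]
print(epsilon_greedy_probs(q_state, epsilon=0.4))
print(epsilon_greedy_action(q_state, epsilon=0.4))
```

With ε = 0.4 and four actions, the greedy action is chosen with probability 0.7 and each other action with probability 0.1.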

SOLUTION: 6.2
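The Q-Learning update can be sketched in Python as below. The dict-of-dicts Q-table, the action names, the function name `q_learning_update`, and the example numbers are all hypothetical (they are not the values from the quiz's Q-table), and `gamma=1.0` is an assumed discount rate chosen only for illustration.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha, gamma, terminal=False):
    """Apply one Q-Learning (Sarsamax) update in place and return the new value.

    q_table: dict mapping state -> dict mapping action name -> action value.
    The TD target uses the maximum action value in next_state, so it does not
    depend on which action the (e.g. epsilon-greedy) policy selects next.
    """
    next_best = 0.0 if terminal else max(q_table[next_state].values())
    target = reward + gamma * next_best
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical Q-table entries for two states (NOT the quiz's values).
q = {
    1: {"up": 0.0, "right": 2.0, "down": 0.0, "left": 0.0},
    2: {"up": 1.0, "right": 3.0, "down": 0.0, "left": 4.0},
}
new_value = q_learning_update(q, state=1, action="right", reward=-1.0,
                              next_state=2, alpha=0.1, gamma=1.0)
print(new_value)  # 2.0 + 0.1 * (-1.0 + 1.0 * 4.0 - 2.0)
```

Only the entry for the current state-action pair changes; the next state's values are read but not modified.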